Molecular Ecology Resources
Wiley
All preprints, ranked by how well they match Molecular Ecology Resources's content profile, based on 161 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Keene, D.; Arya, S.; Walker, B.; Laumer, C. E.
Molecular data have revolutionised taxonomic and ecological research on the hyperdiverse communities of aquatic benthic microinvertebrates known as meiofauna. However, reference sequence databases remain highly incomplete, with variable barcode genes or fragments studied from taxon to taxon. Furthermore, there is a typical tradeoff between universality of primers and phylogenetic resolution, with rRNA markers being robustly recoverable but failing to resolve species-level divergences, and mitochondrial markers showing the reverse trend. Here, we introduce Oxford Nanopore rRNA and COI amplicon sequencing (OrCa-seq), a rapid, low-cost protocol for parallel long-range PCR amplification and multiplexed sequencing of four amplicons, spanning the nearly-complete rRNA cistron (~7-8 kb) and the widely studied Folmer region of COI (represented as overlapping 313 and 658 bp amplicons). This protocol, with its associated bioinformatic workflow, was designed for conducting biodiversity inventories of meiofauna and can be easily carried out in field research and educational contexts, with data available from 96-well plates of specimens within a day of lysis. To validate the method, we processed six plates of student-isolated freshwater and limno-terrestrial meiofauna, characterising the recovery of target genes and taxa with both automated and human-curated BLAST database comparisons. These data demonstrate the universal applicability of OrCa-seq across effectively all meiofauna, including the very smallest species. Nonetheless, recovery efficiency for each amplicon shows variation by taxon, with the full-length Folmer COI amplicon standing out as the most challenging. We present exemplar phylogenetic trees integrating reference sequences, demonstrating the utility of these data in confirming morphological determinations and in identifying anonymous specimens in a reverse taxonomy context.
While developed in a specific educational context for use on meiofauna, the OrCa-seq approach should be readily scalable to larger research datasets, adaptable to many specimen types, and to any combination of taxon- or target-specific primers. As such, it represents a compelling multi-locus extension to the ever-growing repertoire of nanopore DNA barcoding protocols.
Elbrecht, V.; Bourlat, S. J.; Hoerren, T.; Lindner, A.; Mordente, A.; Noll, N. W.; Sorg, M.; Zizka, V. M. A.
Small and rare specimens can remain undetected when metabarcoding bulk samples with a high size heterogeneity of specimens. This is especially critical for Malaise trap samples, where most of the biodiversity is often contributed by small specimens. How to size sort and in which proportions to pool these samples has not been widely explored. We set out to find a size sorting strategy that maximizes taxonomic recovery but remains highly scalable and time efficient. Three Malaise trap samples were size sorted into 4 size classes using dry sieving. Each fraction was homogenized and lysed. The corresponding lysates were pooled to simulate samples never sorted, pooled in equal proportions and in 4 different proportions favoring the small size fractions. DNA from the pooled fractions as well as the individual size classes was extracted and metabarcoded using the FwhF2 and Fol-degen-rev primer set. Additionally, wet sieving strategies were explored. The small size fractions harbored the highest diversity, and were best represented when pooling in favor of small specimens. Not size sorting a sample leads to a 45-77% decrease in taxon recovery compared to size sorted samples. A size separation into only 2 fractions (below 4 mm and above) can already double taxon recovery compared to not sorting. However, increasing the sequencing depth 3-4 fold can also increase taxon recovery to comparable levels, but remains biased toward biomass rich taxa in the sample. We demonstrate that size fractionating bulk Malaise samples can increase taxon recovery. The most practical approach is wet sieving into two size fractions, and proportional pooling of the lysates in favor of the small size fraction (80-90% volume). However, in large projects with time constraints, increasing sequencing depth can also be an alternative solution.
de Flamingh, A.; Ishida, Y.; Pecnerova, P.; Vilchis, S.; Siegismund, H.; van Aarde, R.; Malhi, R.; Roca, A.
Non-invasive biological samples benefit studies that investigate rare, elusive, endangered, and/or dangerous species. Integrating genomic techniques that use non-invasive biological samples with advances in computational approaches can benefit and inform wildlife conservation and management. Here we present a molecular pipeline that uses non-invasive fecal DNA samples to generate low- to medium-coverage genomes (e.g., >90% of the complete nuclear genome at 6X coverage) and metagenomic sequences, combining in a novel fashion widely available and accessible DNA collection cards with commonly used DNA extraction and library building approaches. DNA preservation cards are easy to transport and can be stored non-refrigerated, avoiding cumbersome and/or costly sample methods. The genomic library construction and shotgun sequencing approach did not require enrichment or targeted DNA amplification. The utility and potential of the data generated by this pipeline was demonstrated by the application of genome-scale analysis and metagenomics to zoo and free-ranging African savanna elephants (Loxodonta africana). Fecal samples collected from free-ranging individuals contained an average of 12.41% (5.54-21.65%) endogenous elephant DNA. Clustering of these elephants with others from the same geographic region was demonstrated by a principal component analysis of genetic variation using nuclear genome-wide SNPs. Metagenomic analyses generated compositional taxon classifications that included Loxodonta, green plants, fungi, arthropods, bacteria, viruses and archaea, showcasing the utility of our approach for addressing complementary questions based on host-associated DNA, e.g., pathogen and parasite identification. The molecular pipeline presented here extends applications beyond what has previously been shown for target-enriched datasets and contributes towards the expansion and application of genomic techniques to conservation science and practice.
Rancilhac, L.; Sylvestre, F.; Hutter, C. R.; Arntzen, J. W.; Babik, W.; Crochet, P.-A.; Deso, G.; Duguet, R.; Galan, P.; Pabijan, M.; Policain, M.; Priol, P.; Sabino-Pinto, J.; Capstick, M.; Elmer, K. R.; Dufresnes, C.; Vences, M.
Restriction site-Associated DNA sequencing (RADseq) has great potential for genome-wide systematics studies of non-model organisms. However, accurately assembling RADseq reads into orthologous loci remains a major challenge in the absence of a reference genome. Traditional assembly pipelines cluster putative orthologous sequences based on a user-defined clustering threshold. Because improper clustering of orthologs is expected to affect results in downstream analyses, it is crucial to design pipelines for empirically optimizing the clustering threshold. While this issue has been largely discussed from a population genomics perspective, it remains understudied in the context of phylogenomics and coalescent species delimitation. To address this issue, we generated RADseq assemblies of representatives of the amphibian genera Discoglossus, Rana, Lissotriton and Triturus using a wide range of clustering thresholds. Particularly, we studied the effects of the intra-sample Clustering Threshold (iCT) and between-sample Clustering Threshold (bCT) separately, as both are expected to differ in multi-species data sets. The obtained assemblies were used for downstream inference of concatenation-based phylogenies, and multi-species coalescent species trees and species delimitation. The results were evaluated in the light of a reference genome-wide phylogeny calculated from newly generated Hybrid-Enrichment markers, as well as extensive background knowledge on the species systematics. Overall, our analyses show that the inferred topologies and their resolution are resilient to changes of the iCT and bCT, regardless of the analytical method employed. Except for some extreme clustering thresholds, all assemblies yielded identical, well-supported inter-species relationships that were mostly congruent with those inferred from the reference Hybrid-Enrichment data set. Similarly, coalescent species delimitation was consistent among similarity threshold values. 
However, we identified a strong effect of the bCT on the branch lengths of concatenation and species trees, with higher bCTs yielding trees with shorter branches, which might be a pitfall for downstream inferences of evolutionary rates. Our results suggest that the choice of assembly parameters for RADseq data in the context of shallow phylogenomics might be less challenging than previously thought. Finally, we propose a pipeline for empirical optimization of the iCT and bCT, implemented in optiRADCT, a series of scripts readily usable for future RADseq studies.
Iwaszkiewicz-Eggebrecht, E.; Granqvist, E.; Nowak, K. H.; Valdivia, C.; Buczek, M.; Srivathsan, A.; Hartop, E.; Miraldo, A.; Roslin, T.; Tack, A. J. M.; Lukasik, P.; Meier, R.; Ronquist, F.
1. DNA metabarcoding (high-throughput sequencing of barcode regions from bulk samples) has become a key tool for insect biodiversity assessment. Yet, how methodological choices affect the accuracy of metabarcoding data remains insufficiently explored. In this paper, we ask: (1) How does the lysis method (non-destructive lysis vs. destructive homogenization) affect community recovery? (2) How comprehensively does metabarcoding capture species richness? (3) To what extent can spike-ins improve abundance estimates? (4) How accurately can species abundances be estimated? 2. We evaluated the accuracy of insect metabarcoding using 4,749 bulk samples from a large-scale biodiversity survey subjected to mild lysis. Of these samples, 856 were also homogenized, allowing a systematic comparison of the effect of alternative treatments. To potentially improve abundance estimates, we added six biological spike-ins (i.e., foreign insects) to all samples, and two synthetic spike-ins (artificial DNA fragments) to the homogenization treatment. In addition, we established the contents of 15 samples by individually barcoding all specimens, enabling direct assessment of occurrence and abundance estimates. 3. Our results revealed consistent differences between destructive and non-destructive treatments. While both methods reliably detected the majority of species, small and soft-bodied taxa were more often recovered after mild lysis than after homogenization, while the reverse was true for heavily sclerotized, hairy, and large taxa. Using biological spike-ins for calibration reduced the variance in read numbers per specimen considerably, especially in homogenized samples, while synthetic spike-ins were less effective. In a Bayesian analysis, where species data were matched to the best-fitting spike-in calibration curve, accurate abundance estimates (+/-1 individual) were obtained for 72.9% of species occurrences.
4. Our results show that it is possible to obtain reasonably accurate abundance estimates from metabarcoding data, and that mild lysis and homogenization result in different taxon-specific biases in terms of occurrence data, with neither method outperforming the other. Accuracy is improved by homogenization rather than mild lysis of samples, and by the use of biological rather than synthetic spike-ins. Together, these findings provide a major step towards robust, quantitative biodiversity monitoring using DNA-metabarcoding.
Rodriguez, L. K.; Schallhart, S.; Hobmeier, P.; Curran, T.; Perez-Jorge, S.; Prieto, R.; Oliveira, C.; Silva, M. A.; Thalinger, B.
Environmental DNA (eDNA) analyses have become a powerful tool for non-invasive biodiversity monitoring, yet the applicability of population genetic approaches to environmental samples remains largely unexplored. Even when genetic traces originate from a single individual, low target DNA concentrations and amplification or sequencing artefacts can compromise downstream genetic inferences. Here, we present a novel approach for obtaining demographic insights and lineage-level mitogenomic information from aquatic eDNA samples collected near vertebrate individuals. Paired eDNA and tissue samples were collected during sperm whale (Physeter macrocephalus) encounters in the Azores. Samples were screened for the presence of vertebrate eDNA and analyzed with a novel molecular sex identification assay. Additionally, long-range PCR was used to amplify up to five mitochondrial DNA fragments (~3-4 kb) before subsequent sequencing on an Oxford Nanopore Technologies platform. A stringent three-tier filtering framework capable of identifying true mitogenomic variation across eDNA samples was developed for maximum recovery of genetic diversity at the haplogroup level. By benchmarking eDNA samples via their paired tissues, parameter values were optimized to maximize concordance and minimize spurious variant calls. Sexing was successful for 50% of eDNA samples, with 96% concordance to paired tissues, and marine vertebrate DNA concentration significantly predicted sexing success. Further, Medaka polishing produced high identity mitochondrial consensus sequences (>16 kb) from eDNA samples. Across filtering regimes in the framework, curated SNP panels comprising up to 453 high-confidence mitochondrial SNPs resolved 19 haplogroups, with 93% concordance between eDNA and tissue samples.
An intermediate bioinformatics filtering strategy maximized biologically accurate haplogroup recovery while minimizing sequencing artefacts, providing the most reliable lineage-level inferences. This integrative approach demonstrates that targeted nuclear assays combined with long-range mitochondrial sequencing can recover individual-level genetic information from aquatic eDNA. By defining analytical thresholds governing success, the framework advances non-invasive genetic monitoring of populations via eDNA and enables population-level monitoring and conservation of endangered and genetically-vulnerable species.
Armstrong, E. E.; Li, C.; Campana, M. A.; Ferrari, T.; Kelley, J. L.; Petrov, D.; Solari, K. A.; Mooney, J. A.
Despite substantial reductions in the cost of sequencing over the last decade, genetic panels remain relevant due to their cost-effectiveness and flexibility across a variety of sample types. In particular, single nucleotide polymorphism (SNP) panels are increasingly favored for conservation applications. SNP panels are often used because of their adaptability, effectiveness with low-quality samples, and cost-efficiency for use in population monitoring and forensics. However, the selection of diagnostic SNPs for population assignment and individual identification can be challenging. The consequences of poor SNP selection are under-powered panels, inaccurate results, and monetary loss. Here, we develop a novel user-friendly SNP selection pipeline for population assignment and individual identification, mPCRselect. mPCRselect allows any researcher, who has sufficient SNP-level data, to design a successful and cost-effective SNP panel for species of conservation concern.
Gautier, M.; Coronado-Zamora, M.; Vitalis, R.
Introduced over seventy years ago, F-statistics have been and remain central to population and evolutionary genetics. Among them, FST is one of the most commonly used descriptive statistics in empirical studies, notably to characterize the structure of genetic polymorphisms within and between populations, to shed light on the evolutionary history of populations, or to identify marker loci under differential selection for adaptive traits. However, the use of FST in simplified population models can overlook important hierarchical structures, such as geographic or temporal subdivisions, potentially leading to misleading interpretations and increasing false positives in genome scans for adaptive differentiation. Hierarchical F-statistics have been introduced to account for multiple predefined levels of population structure. Several estimators have also been proposed, including robust ones implemented in the popular R package hierfstat. Nevertheless, these were primarily designed for individual genotyping data and can be computationally intensive for large genomic datasets. In this study, we extend previous work by developing unbiased method-of-moments estimators for hierarchical F-statistics tailored for Pool-Seq data, a cost-effective alternative to individual genome sequencing. These Pool-Seq estimators have been developed in an ANOVA framework, using definitions based on identity-in-state probabilities. The new estimators have been implemented in an updated version of the R package poolfstat, together with estimators for sample allele count data derived from individual genotyping data. We validate and compare the performance of these estimators through extensive simulations under a hierarchical island model.
Finally, we apply these estimators to real Pool-Seq data from Drosophila melanogaster populations, demonstrating their usefulness in revealing population structure and identifying loci with high differentiation within or between groups of subpopulations and associated with spatial or temporal genetic variation.
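As a rough illustration of what hierarchical F-statistics partition, the sketch below computes heterozygosity-based descriptive ratios for a single biallelic locus from known subpopulation allele frequencies. It is not the unbiased ANOVA-based Pool-Seq estimator described in the abstract and does not use the poolfstat API. The function names, the equal weighting of groups and subpopulations, and the example frequencies are all assumptions made for illustration.

```python
from statistics import mean


def het(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def hierarchical_f(groups):
    """Return (F_SG, F_GT, F_ST) for one locus.

    `groups` is a list of groups; each group is a list of subpopulation
    allele frequencies. Hypothetical helper: groups and subpopulations are
    weighted equally, and no sampling-bias correction is applied.
    """
    # Mean heterozygosity within subpopulations (averaged per group first).
    h_s = mean(mean(het(p) for p in g) for g in groups)
    # Heterozygosity expected from each group's mean allele frequency.
    group_freqs = [mean(g) for g in groups]
    h_g = mean(het(p) for p in group_freqs)
    # Heterozygosity expected from the overall mean allele frequency.
    h_t = het(mean(group_freqs))
    if h_g == 0 or h_t == 0:
        raise ValueError("locus is monomorphic at the group or total level")
    f_sg = (h_g - h_s) / h_g  # subpopulations within groups
    f_gt = (h_t - h_g) / h_t  # groups within the total
    f_st = (h_t - h_s) / h_t  # subpopulations within the total
    return f_sg, f_gt, f_st


# Made-up frequencies: two groups of two subpopulations each.
f_sg, f_gt, f_st = hierarchical_f([[0.1, 0.2], [0.8, 0.9]])
print(round(f_sg, 4), round(f_gt, 4), round(f_st, 4))
```

In this toy example most of the differentiation lies between groups rather than among subpopulations within groups, and the three ratios satisfy (1 - F_ST) = (1 - F_SG)(1 - F_GT) by construction.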
Wolany, L.; Klinkenborg, K.; Leese, F.; Buchner, D.
DNA metabarcoding is a central tool in biodiversity research and monitoring, producing detailed taxa lists with comparatively little time and effort. One of its limitations, however, is the lack of quantitative data on biomass or abundance. This limitation has two main reasons: 1) template copy number variation and 2) primer-induced amplification bias. Many metabarcoding markers are mitochondrial and mitochondrial copy numbers vary in animal tissues, potentially decoupling sequence counts from biomass. Additionally, primer mismatches can lead to taxon-specific amplification biases, for which PCR cycle calibration has been proposed as a solution. To mechanistically study both effects, we constructed mock communities of different arthropod species. We combined digital droplet PCR and COI metabarcoding to quantify relationships between biomass, mitochondrial copy number and metabarcoding reads. Mitochondrial DNA copy numbers per biomass varied strongly within and among the different taxa. Metabarcoding reads did not reflect input mitochondrial DNA copies without a correction. Attempts to correct for amplification bias via PCR cycle calibration failed as read proportions remained stable across cycles. We therefore mathematically derived an approach to estimate relative amplification bias and initial mitochondrial DNA copy numbers in a sample based on a non-exponential amplification bias model and demonstrate its applicability. Still, the detected high variation in mitochondrial copy numbers and derived prerequisites necessary to calculate amplification efficiencies and mitochondrial copy numbers limit the practical application. Our study highlights fundamental constraints of quantitative metabarcoding and underscores the need for additional methodological approaches for quantitative insights while delivering essential conceptual insights.
Zelvelder, B.; Benoit, L.; Loiseau, A.; Haran, J.; Allio, R.
Target enrichment methods have provided unprecedented advances in phylogenomics. Targeting hundreds of conserved regions has proven to be a good tradeoff between cost and efficiency, while being useful for museomics and diversified non-model clades. Unfortunately, current methods used for identifying such regions involve high degrees of conservation within targeted elements, usually pushing researchers to rely on flanking data with little guarantee for homology. With a growing number of high-quality genomes available throughout the Tree of Life, new opportunities emerge to improve marker selection. In this study, we introduce GABBI, a new method for designing target capture probes by taking advantage of genome alignments, avoiding the selection of a single reference genome that can cause notable biases. We compare GABBI-derived markers to the most commonly used probe design method, PHYLUCE, at two taxonomic scales, the weevil superfamily Curculionoidea and the tribe Pachyrhynchini. At both taxonomic scales, results show that our new method allows identifying more variable loci that prove to be more phylogenetically resolutive than the PHYLUCE-derived ones. Doing so, we provide the first probe set specifically designed for weevils, targeting a wide set of 4,255 shared homologous regions, encouraging future research on systematics and macroevolution of one of the most diverse and economically important groups of insects. By providing GABBI as an automated and open-access pipeline, we hope to open new probe design opportunities to other taxonomic groups that face similar phylogenetic obstacles.
Ollivier, M.; Marquisseau, A.; Dufrene, E.; Rudelle, R.; The CODABEILLES Consortium; Rougerie, R.; Perrard, A.; Pichon, M.
In the Anthropocene, the decline of insect pollinators poses a significant threat to ecosystem services, particularly to wild bee populations essential for plant biodiversity and agricultural productivity. France, with 983 species, hosts one of the most diverse bee faunas in Europe, yet these species face growing pressures from habitat loss, climate change, and intensive agriculture. Addressing this crisis requires robust taxonomic frameworks and efficient species identification methods to support long-term monitoring initiatives such as the European Pollinator Monitoring Scheme, EU-PoMS. DNA barcoding, utilizing the COI-5P gene, has proven effective for species delineation and biodiversity monitoring, particularly in detecting cryptic diversity among genera with large numbers of species such as Andrena, Nomada or Lasioglossum. However, significant gaps remain in reference libraries, particularly for the species from the Mediterranean Basin. To bridge this gap, the CODABEILLES initiative was launched in 2021 to enhance barcode data for the French bee fauna. Initially, only 25% of species had barcodes from French voucher specimens, increasing to 62% when considering voucher specimens from other countries. By 2025, thanks to collaboration with sixteen specialists and institutions, CODABEILLES contributed 1477 reference barcodes, covering approximately 560 species and raising barcode coverage to 82%. When integrating data published under other initiatives over the same period the coverage reaches 94% of the French bee fauna. This dataset significantly enhances species identification accuracy and supports large-scale pollinator monitoring through metabarcoding and environmental DNA approaches. Despite the success of COI-5P barcoding, taxonomic inconsistencies persist, necessitating further integrative research. 
This study underscores the need for continued collaboration among taxonomists, molecular biologists, and conservationists to refine species classifications and ensure comprehensive reference databases. The improved barcode coverage provided by CODABEILLES paves the way for more accurate DNA-based monitoring of wild bee populations and their ecological interactions, crucial for guiding conservation strategies in the face of ongoing environmental change.
Brandao-Dias, P. F.; Guri, G.; Shaffer, M.; Allan, E. A.; Kelly, R. P.
Environmental DNA (eDNA) metabarcoding provides powerful insights into species presence and community composition, but remains limited in its ability to quantify species abundance or structure. Here, we show that deviation between observed haplotype frequencies within a given sample and the population haplotype frequencies can be used to infer the number of individual contributors to an eDNA sample. We also lay out the theory for how population haplotype frequencies can be approximated from eDNA data alone, enabling broad applicability even in the absence of tissue-based references. We then present an estimator to derive the number of individual contributors to a given eDNA sample and validate its performance using simulations with variable allele frequencies and noise. Our framework demonstrates that differences between expected and observed frequencies carry meaningful biological information in eDNA data. Our results show that the number of contributors can be recovered under a range of conditions, particularly with hypervariable markers and sufficient sampling. This approach complements existing molecular methods and opens a new avenue for inferring abundance from eDNA metabarcoding datasets.
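The intuition that sampling deviation carries information about the number of contributors can be sketched with a simple moment argument. This is an illustrative assumption, not the estimator presented in the preprint. If n individuals each carry one haplotype drawn from population frequencies p_i and contribute equal amounts of DNA, the expected squared deviation of the sample frequencies is the sum of p_i(1 - p_i) divided by n. The function names and frequencies below are hypothetical, and the sketch ignores sequencing noise and unequal shedding.

```python
import itertools
import math


def expected_sq_deviation(population, n):
    """Exact E[sum_i (x_i - p_i)^2] when n haplotypes are drawn independently.

    Enumerates every ordered draw, so it is only practical for tiny n.
    """
    k = len(population)
    total = 0.0
    for draws in itertools.product(range(k), repeat=n):
        prob = math.prod(population[i] for i in draws)
        sample = [draws.count(i) / n for i in range(k)]
        total += prob * sum((x - p) ** 2 for x, p in zip(sample, population))
    return total


def estimate_contributors(observed, population):
    """Moment-style estimate: sum p(1-p) divided by the observed squared deviation."""
    spread = sum(p * (1.0 - p) for p in population)
    deviation = sum((x - p) ** 2 for x, p in zip(observed, population))
    return math.inf if deviation == 0 else spread / deviation


population = [0.5, 0.3, 0.2]  # made-up population haplotype frequencies
spread = sum(p * (1.0 - p) for p in population)

# The expected deviation shrinks as 1/n, which is what the estimate inverts.
for n in (1, 2, 3, 4):
    print(n, expected_sq_deviation(population, n), spread / n)

# Estimate for one made-up sample; single samples are noisy.
print(estimate_contributors([0.5, 0.5, 0.0], population))
```

Because the estimate divides by a noisy quantity, individual samples can be far from the true number of contributors. The sketch only shows why larger deviations point to fewer individuals.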
Kuijk, J.; van den Burg, M.; Didaskalou, E.; de Boer, M.; Debrot, A.; Wielstra, B.; Stewart, K. A.
Reptiles have among the highest extinction risk across terrestrial vertebrates, with habitat fragmentation, habitat destruction, and invasive alien species being the primary causes of reptile species loss on a global scale. Invasive hybridization (i.e. hybridization between native and invasive alien species) is increasing globally, causing the extinction of native genotypes, and this phenomenon is particularly pervasive in Caribbean iguanas. The Lesser Antillean Iguana (Iguana delicatissima), a keystone species of Caribbean coastal ecosystems, has become critically endangered mainly due to ongoing hybridization with the invasive Common Green Iguana (I. iguana). For impactful conservation intervention, the need for early detection of invasive animals and their progeny, or detection of surviving pure native animals, is urgent. We aimed to develop a novel environmental DNA (eDNA) toolkit using Kompetitive Allele Specific PCR (KASP) technology, a method of allele-specific amplification for cost-effective and efficient sampling of terrestrial substrates to aid in mapping the distribution of native I. delicatissima, invasive I. iguana, and signal potential invasive hybridization. We demonstrate proof-of-concept and successfully identified I. delicatissima, I. iguana, and their hybrids via blood samples using our primer sets, as well as successful detection of I. delicatissima in several ex-situ (Rotterdam Zoo) and in-situ (St. Eustatius) eDNA samples, collected with environmental swabs and tape-lifting. We found that sampling potential perching spots yielded the highest number of positive detections via environmental swabbing and tape-lifting. Our toolkit demonstrates the potential of terrestrial eDNA sampling for iguana conservation, enabling faster detection of potential invasive hybridization. Additionally, the method holds promise for other terrestrial cryptic species, contributing to broader collection of population-level information.
Jecha, K.; Lavanchy, G.; Schwander, T.
Advancements in genetic technologies have allowed us to generate large data sets relatively quickly and easily. However, without proper quality control checks, the inferences drawn from such data can be erroneous and go on to misinform further studies. DNA contamination between focal samples of the same or closely related species can have major impacts on downstream analyses, but their presence is seldom tested. Here, we created a pipeline combining competitive mapping to remove reads from intergeneric contamination, followed by a filtering method using allelic depth ratio frequencies to exclude intrageneric contamination. We then used a RADseq dataset of over 1,000 Swiss Lasius ants that were cross contaminated to various levels prior to sequencing to assess the impact of contamination on inferences of introgression. The original dataset presented widespread introgression between species in which hybridization has never been recorded. After thorough decontamination, we found only one individual with a strong signature of introgression, between the species L. emarginatus and L. platythorax, revealing that introgression is extremely rare in this genus. Implementing our method of filtering can significantly improve the robustness of biological findings based on genomic datasets. We recommend that systematically checking for the presence of cross contamination should be a key step in the preprocessing of genomic datasets.
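The idea behind filtering on allelic depth ratios can be sketched as follows. In a clean diploid sample, reads at heterozygous sites should split roughly evenly between the two alleles, whereas cross contamination tends to add alleles at low read fractions. The code is a minimal illustration under that assumption and is not the pipeline described in the abstract. The function name, the 0.3-0.7 window, and the minimum depth are hypothetical choices.

```python
def skewed_fraction(sites, low=0.3, high=0.7, min_depth=10):
    """Fraction of two-allele sites whose alt-read ratio falls outside [low, high].

    `sites` is a list of (ref_reads, alt_reads) tuples for one sample.
    Sites with too few reads, or with reads for only one allele, are skipped.
    Returns None when no site is informative.
    """
    ratios = []
    for ref, alt in sites:
        depth = ref + alt
        if depth < min_depth or ref == 0 or alt == 0:
            continue
        ratios.append(alt / depth)
    if not ratios:
        return None
    skewed = sum(1 for r in ratios if r < low or r > high)
    return skewed / len(ratios)


# Made-up read counts for two samples.
clean = [(10, 10), (12, 9), (8, 11), (15, 14)]
suspect = [(18, 2), (10, 10), (19, 1), (17, 3)]

print(skewed_fraction(clean))    # balanced ratios at every site
print(skewed_fraction(suspect))  # most sites show a minor allele at low depth
```

A sample with a high skewed fraction would be a candidate for exclusion or closer inspection. Where to place the cutoff is an empirical choice that this sketch does not address.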
Melendez, D.; Sapci, A. O. B.; Bafna, V.; Mirarab, S.
Ultraconserved elements (UCEs) provide ideal candidates for targeted sequencing and cost-effective acquisition of genome-wide data. While UCEs have been widely used in phylogenetic studies to reconstruct evolutionary relationships, their use in population-level research has been limited. This limited application stems from uncertainty over whether UCEs can capture the levels of genetic variation needed to answer population genomic questions central to ecology and biodiversity research. The concern is that, by definition, UCEs are highly conserved and may therefore lack sufficient within-species variation. The more variable flanking regions (400-750 bp from the UCE core) contain informative polymorphisms, though diversity decreases near the core. Thus, any naive estimator of genetic diversity that ignores this conservation will have an underestimation bias. In this paper, we introduce SPrUCE: Sigmoid Pi requiring UCEs, a reference-free method that estimates nucleotide diversity (π) from aligned UCE data. SPrUCE corrects underestimation bias by modeling the change in diversity away from the UCE core using a Gompertz function. The model accounts for the bias introduced by the conserved core and allows for more accurate per-site diversity estimates. We tested SPrUCE on UCE alignments from a range of taxa, including invertebrates and vertebrates (finches, honeybees, sheep, and smelt). SPrUCE produces diversity values consistent with whole-genome derived estimates that require an assembled reference. It is fast, scalable, and effective even with missing data. Its modeling approach enables accurate population-level assessments of genetic diversity, offering a new and reliable option for conservation and population genetics.
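To make the underestimation argument concrete, the sketch below evaluates a standard three-parameter Gompertz curve for per-site diversity as a function of distance from the UCE core and compares its asymptote with a naive average over all positions. The abstract does not give SPrUCE's parameterization or fitting procedure, so the parameter names and values here are hypothetical and no fitting is attempted.

```python
import math


def gompertz(distance, asymptote, displacement, rate):
    """Gompertz curve: asymptote * exp(-displacement * exp(-rate * distance))."""
    return asymptote * math.exp(-displacement * math.exp(-rate * distance))


# Hypothetical parameters: diversity plateaus at 0.01 far from the core.
asymptote, displacement, rate = 0.01, 4.0, 0.01

# Per-site diversity at 0..750 bp from the core under this curve.
per_site = [gompertz(d, asymptote, displacement, rate) for d in range(751)]

# Averaging over all positions mixes in the conserved core and lands
# below the plateau value that reflects unconstrained diversity.
naive_mean = sum(per_site) / len(per_site)

print(f"diversity at core:  {per_site[0]:.5f}")
print(f"diversity at 750bp: {per_site[-1]:.5f}")
print(f"naive mean:         {naive_mean:.5f}")
print(f"asymptote:          {asymptote:.5f}")
```

Under these made-up values the naive mean sits below the asymptote, which is the direction of bias the abstract describes. How strongly it does so depends entirely on the assumed curve.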
Urban, L.; Miller, A. K.; Eason, D.; Vercoe, D.; Shaffer, M.; Wilkinson, S. P.; Jeunen, G.-J.; Gemmell, N. J.; Digby, A.
We used non-invasive real-time genomic approaches to monitor one of the last surviving populations of the critically endangered kākāpō (Strigops habroptilus). We first established an environmental DNA metabarcoding protocol to identify the distribution of kākāpō and other vertebrate species in a highly localized manner using soil samples. Harnessing real-time nanopore sequencing and the high-quality kākāpō reference genome, we then extracted species-specific DNA from soil. We combined long read-based haplotype phasing with known individual genomic variation in the kākāpō population to identify the presence of individuals, and confirmed these genomically informed predictions through detailed metadata on kākāpō distributions. This study shows that individual identification is feasible through nanopore sequencing of environmental DNA, with important implications for future efforts in the application of genomics to the conservation of rare species, potentially expanding the application of real-time environmental DNA research from monitoring species distribution to inferring fitness parameters such as genomic diversity and inbreeding.
van Berkel, D.; Breve, N.; de Boer, M.; Reynaud, E.; Nijland, R.
Molecular techniques involving environmental DNA (eDNA) are increasingly used for aquatic species detection. Metabarcoding, a widely adopted technique, suffers from primer bias: uneven amplification of species due to primer mismatches. Primer bias can be eliminated by omitting PCR and sequencing all eDNA in a sample. This method, known as metagenomics, offers potential benefits for estimating relative abundance and detecting epigenetic modifications, but is seldom applied to eukaryotic communities and eDNA. This study uses an expanded two-by-two design to compare fish species detection between multi-marker metabarcoding and metagenomics using two filter types (conventional versus high-flow). Environmental DNA was collected in a controlled setup and two field settings, which contained several fish species including European sturgeon (Acipenser sturio). Moreover, we explore methylation patterns obtained from nanopore native sequencing. All species present in the controlled environment were detected using both metabarcoding and metagenomics. In field settings, metagenomics detected more species than metabarcoding. High-flow filters recovered more species across all sequencing datasets, except in metabarcoding of field settings. Relative read counts between metabarcoding and metagenomics indicate that primer bias is present in the primer sets used. Most fish metagenomic sequences were identified as A. sturio across all eDNA samples. We observed three base modifications on the 18S region of A. sturio, where three sites showed different methylation patterns between eDNA samples. Our results demonstrate that metabarcoding and metagenomics are complementary in species detection and that metagenomics provides additional insights into base modifications. Moreover, high-flow filters offer strong potential for improved species detection in various environments.
Hong, A.; Cheek, R. G.; Mukherjee, K.; Yooseph, I.; Oliva, M.; Heim, M.; Funk, W. C.; Tallmon, D.; Boucher, C.
The genetic effective size (Ne) is arguably one of the most important characteristics of a population as it impacts the rate of loss of genetic diversity. Genetic estimators of Ne are increasingly popular tools in population and conservation genetic studies. Yet there are very few methods that can estimate Ne from data from a single population and without extensive information about the genetics of the population, such as a linkage map or a reference genome of the species of interest. We present ONeSAMP 3.0, an algorithm for estimating Ne from single nucleotide polymorphism (SNP) data collected from a single population sample using Approximate Bayesian Computation and local linear regression. We demonstrate the utility of this approach using simulated Wright-Fisher populations and empirical data from five endangered Channel Island fox (Urocyon littoralis) populations, evaluating the performance of ONeSAMP 3.0 against a commonly used Ne estimator. Our results show that ONeSAMP 3.0 is robust to the number of individual samples and number of loci included, and appears accurate even if the range of true Ne values is large. This method is broadly applicable to natural populations and is flexible enough that future versions could easily include summary statistics appropriate for a suite of biological and sampling conditions. ONeSAMP 3.0 is publicly available under the GNU license at https://github.com/AaronHong1024/ONeSAMP_3 and is also available through Bioconda (https://bioconda.github.io/index.html).
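For readers unfamiliar with the general technique named in this abstract, the Python sketch below shows rejection-based Approximate Bayesian Computation followed by a local linear regression adjustment, using a single summary statistic. It is a generic toy, not ONeSAMP 3.0's code: the function name `abc_local_linear`, the flat prior, and the toy simulator (statistic = parameter + Gaussian noise) are hypothetical stand-ins. In an Ne setting, the parameter would be Ne and the statistics would be SNP-based summaries computed from simulated populations.

```python
import random

def abc_local_linear(observed, prior_draw, simulate, n_sims=5000, n_keep=500, seed=1):
    """Rejection ABC plus a local linear regression adjustment (one summary statistic)."""
    rng = random.Random(seed)
    thetas = [prior_draw(rng) for _ in range(n_sims)]
    stats = [simulate(theta, rng) for theta in thetas]

    # Rejection step: keep the n_keep simulations closest to the observed statistic.
    order = sorted(range(n_sims), key=lambda i: abs(stats[i] - observed))
    keep = order[:n_keep]
    bandwidth = abs(stats[keep[-1]] - observed) or 1e-12

    # Epanechnikov kernel weights: closer simulations count more in the regression.
    w = [1.0 - ((stats[i] - observed) / bandwidth) ** 2 for i in keep]
    sw = sum(w)
    mean_s = sum(wi * stats[i] for wi, i in zip(w, keep)) / sw
    mean_t = sum(wi * thetas[i] for wi, i in zip(w, keep)) / sw
    sxx = sum(wi * (stats[i] - mean_s) ** 2 for wi, i in zip(w, keep))
    sxy = sum(wi * (stats[i] - mean_s) * (thetas[i] - mean_t) for wi, i in zip(w, keep))
    slope = sxy / sxx if sxx > 0 else 0.0

    # Regression adjustment: shift each accepted draw to where it would sit
    # if its simulated statistic had matched the observed one exactly.
    return [thetas[i] - slope * (stats[i] - observed) for i in keep]

# Toy stand-ins (hypothetical): flat prior on theta, statistic = theta + Gaussian noise.
prior_draw = lambda rng: rng.uniform(0.0, 10.0)
simulate = lambda theta, rng: theta + rng.gauss(0.0, 1.0)

adjusted = abc_local_linear(5.0, prior_draw, simulate)
post_mean = sum(adjusted) / len(adjusted)
print(f"accepted draws: {len(adjusted)}, adjusted posterior mean: {post_mean:.2f}")
```

The regression step is what lets the rejection tolerance stay fairly loose: accepted draws whose statistic misses the observed value are corrected along the fitted slope rather than discarded. A practical estimator would use several summary statistics and a multivariate regression.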
Pavinato, V. A. C.; Wijeratne, S.; Spacht, D.; Denlinger, D. L.; Meulia, T.; Michel, A. P.
The sequencing of whole or partial (e.g. reduced representation) genomes is commonly employed in molecular ecology and conservation genetics studies. However, due to sequencing costs, a trade-off between the number of samples and genome coverage can hinder research for non-model organisms. Furthermore, the processing of raw sequences requires familiarity with coding and bioinformatic tools that is not always available. Here, we present a guide for isolating a set of short, SNP-containing genomic regions for use with targeted amplicon sequencing protocols. We also present a Python pipeline, PypeAmplicon, that facilitates processing of reads to individual genotypes. We demonstrate the applicability of our method by generating an informative set of amplicons for genotyping of the Antarctic midge, Belgica antarctica, an endemic dipteran species of the Antarctic Peninsula. Our pipeline analyzed raw sequences produced by a combination of highly multiplexed PCR and next-generation sequencing. A total of 38 out of 47 (81%) amplicons designed by our panel were recovered, allowing successful genotyping of 42 out of 55 (76%) targeted SNPs. The sequencing of ~150 bp around the targeted SNPs also uncovered 80 new SNPs, which complemented our analyses. By comparing overall patterns of genetic diversity and population structure of amplicon data with the low-coverage, whole-genome re-sequencing (lcWGR) data used to isolate the informative amplicons, we were able to demonstrate that amplicon sequencing produces information and results similar to those of lcWGR. Our methods will benefit other research programs where rapid development of population genetic data is needed but hindered by high expense and a lack of bioinformatic experience.
Landis, J. B.; Hufnagel, E.; Felton, J. M.; Harden, J. J.; Almeida, D.; Specht, C. D.
Recent advancements in next-generation sequencing approaches allow for expansion of evolutionary research into the discovery of genetic patterns and processes underlying diversification across scales. The Element Biosciences AVITI platform, increasingly popular partly because of its high sequencing accuracy and low reagent cost, is becoming a viable alternative for generating massive amounts of comparative sequencing data across diverse organismal lineages. Using a data set of five accessions from the monocot genus Costus, we tested miniaturization conditions for generating robust, cost-effective libraries and compared data generated by the AVITI and Illumina sequencing platforms to investigate the potential for combining data for population genomic and phylogenomic analyses. Our results show that the AVITI and Illumina data sets are highly congruent in terms of inferring overlapping SNPs, with only a small fraction of SNPs detected by just one of the two platforms. The rates of duplication in miniaturized libraries were much higher than in full-volume libraries and in the Illumina libraries, resulting in missing SNPs and less sequence coverage when volumes are reduced. For all generated libraries, most downstream evolutionary analyses, including clustering algorithms (such as PCA) and phylogenetic inference, yielded similar results. However, Structure analyses were less consistent across datasets, with data from the most miniaturized libraries being assigned to the wrong clusters. The AVITI platform should be seen as a cost-effective approach for generating genomic data for comparison across taxonomic lineages, even for ongoing projects where Illumina data already exist.